BUILD-475: Build Inventory and PipelineRun Controller #4
Conversation
Force-pushed from ff9a218 to 4b0702b.
/retitle BUILD-475: Build Inventory and PipelineRun Controller
/assign @adambkaplan /cc @coreydaley
I think there are only two things that need to be changed here - the domain, and that bit about co-mingling test code. Otherwise let's merge this and get the project rolling!
PROJECT
Outdated
@@ -0,0 +1,7 @@
---
projectName: triggers
domain: triggers.shipwright-io
nit - domain should be triggers.shipwright.io. I imagine this has a downstream impact on the generated CRDs.
Thanks, I hadn't noticed the typo in the domain attribute. Fixing it.
pkg/filter/pipelinerun.go
Outdated
// searchBuildRunForRunOwner inspects the object owners for a Tekton Run and returns it, otherwise nil.
func searchBuildRunForRunOwner(br *v1alpha1.BuildRun) *types.NamespacedName {
	for _, ownerRef := range br.OwnerReferences {
		if ownerRef.APIVersion == stubs.TektonAPIv1alpha1 && ownerRef.Kind == "Run" {
Something to keep note of - I believe more recent versions of Tekton have a beta version of the Run object. This should be filed as an issue.
Sure, that's a good point, Adam. Currently the Run resource is still at the v1alpha1 version; a new issue has been filed (#7).
pkg/filter/pipelinerun.go
Outdated
"time" | ||
|
||
"github.com/shipwright-io/build/pkg/apis/build/v1alpha1" | ||
"github.com/shipwright-io/triggers/test/stubs" |
Looks like you are co-mingling test code with "production" code. Can you try moving this info (which looks like constants?) here, or into a separate shared package?
Indeed, I exported the constants to a dedicated package, please consider the latest changes.
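For illustration, a minimal sketch of what such a shared package could look like; the package name, constant names, and values are assumptions, not necessarily the layout adopted in this PR:

// Package constants holds identifiers shared between production and test code (hypothetical sketch).
package constants

const (
	// TektonAPIv1alpha1 is the apiVersion of Tekton v1alpha1 resources, e.g. Run.
	TektonAPIv1alpha1 = "tekton.dev/v1alpha1"

	// TektonAPIv1beta1 is the apiVersion of Tekton v1beta1 resources, e.g. PipelineRun.
	TektonAPIv1beta1 = "tekton.dev/v1beta1"
)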
// Inventory keeps track of Build object details, on which it can find objects that match the
// repository URL and trigger rules.
type Inventory struct {
This is not a bad thing to have for an initial proof of concept. I am concerned about scalability if we are keeping data in memory for search/lookup purposes. I think we need a roadmap item for scaling the Inventory.
I think we need to address scalability on demand. A regular Kubernetes informer keeps objects in memory as well; that's the traditional design.
The Inventory component only stores the attributes it needs to perform the search queries, so it should be lighter/faster than a regular informer. However, the search algorithm is naive, checking each entry; if it becomes a bottleneck -- i.e. with too many entries in the cache -- we will discuss how to improve the performance.
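To make the trade-off concrete, here is a heavily simplified, hypothetical sketch of the in-memory design being discussed: only the searchable attributes are stored and the lookup is a linear scan. The real Inventory tracks more attributes and the trigger rules, so the names and signatures below are assumptions.

// Hypothetical, simplified in-memory inventory with a naive linear search.
package inventory

import (
	"sync"

	"k8s.io/apimachinery/pkg/types"
)

type entry struct {
	name    types.NamespacedName
	repoURL string
}

type Inventory struct {
	m       sync.Mutex
	entries []entry
}

// Add records the searchable attributes of a Build (assumed signature).
func (i *Inventory) Add(name types.NamespacedName, repoURL string) {
	i.m.Lock()
	defer i.m.Unlock()
	i.entries = append(i.entries, entry{name: name, repoURL: repoURL})
}

// SearchForGit returns the Builds whose repository URL matches the informed URL (naive scan).
func (i *Inventory) SearchForGit(repoURL string) []types.NamespacedName {
	i.m.Lock()
	defer i.m.Unlock()
	var found []types.NamespacedName
	for _, e := range i.entries {
		if e.repoURL == repoURL {
			found = append(found, e.name)
		}
	}
	return found
}

If the linear scan ever becomes a bottleneck, an index keyed by repository URL would be the obvious next step.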
Nice work. But a lot. I am not yet through.
import (
	"context"

	"github.com/shipwright-io/build/pkg/apis/build/v1alpha1"
Suggesting to use an alias here. This will make it easier to replace it with a beta import in the future.
"github.com/shipwright-io/build/pkg/apis/build/v1alpha1" | |
build "github.com/shipwright-io/build/pkg/apis/build/v1alpha1" |
Let's tackle this on #9.
"github.com/shipwright-io/triggers/test/stubs" | ||
|
||
"github.com/go-logr/logr" | ||
tknv1beta1 "github.com/tektoncd/pipeline/pkg/apis/pipeline/v1beta1" |
Similar as above (and I know we have never discussed this), but given we know that Tekton moves to v1 soon, I suggest to use an alias like this:
- tknv1beta1 "github.com/tektoncd/pipeline/pkg/apis/pipeline/v1beta1"
+ pipeline "github.com/tektoncd/pipeline/pkg/apis/pipeline/v1beta1"
Let's tackle this on #9
br := v1alpha1.BuildRun{
	ObjectMeta: metav1.ObjectMeta{
		Namespace:    pipelineRun.GetNamespace(),
		GenerateName: fmt.Sprintf("%s-", pipelineRun.GetName()),
My personal opinion here, I tend to think that we should take the Build name here. But maybe we need to go through the scenario to decide what is better. And maybe we also come to the conclusion that we want this to be specified in the Build's trigger definitions.
- GenerateName: fmt.Sprintf("%s-", pipelineRun.GetName()),
+ GenerateName: fmt.Sprintf("%s-", buildName),
I think it makes sense, Sascha. Taking into consideration that trigger rules are defined in the Build object itself, it makes sense to follow its name. I'll make the change 👍
Labels: map[string]string{
	filter.OwnedByPipelineRunLabelKey: pipelineRun.GetName(),
},
OwnerReferences: []metav1.OwnerReference{{
	APIVersion: stubs.TektonAPIv1beta1,
	Kind:       "PipelineRun",
	Name:       pipelineRun.GetName(),
	UID:        pipelineRun.GetUID(),
}},
Hm. The scenarios imo can be very different, and whether there should be that strong a relationship, I am not sure. I would though expect to see the above for the custom task integration, where the BuildRun really becomes part of the PipelineRun. The trigger on PipelineRun status imo couples these two artifacts more loosely. We should still have a label, but that should maybe be "triggered-by" rather than an ownership. Wdyt?
As an initial approach, I think the strong relationship makes sense because it keeps the cluster clean: when the PipelineRun gets removed, the objects it triggered will also get removed.
The rationale is based on the fact that both Shipwright and Tekton produce ephemeral objects; they are only useful while the build process is running and afterwards should not leave instances behind.
// if the label recording the original PipelineRun name (which triggered builds) matches the
// current object name, it gets skipped from the rest of the syncing process
if !filter.PipelineRunNotSyncedAndNotCustomTask(&pipelineRun) {
	logger.V(0).Info("Skipping due to either already synced or part of a custom-task")
	return ctrl.Result{}, nil
}
I think this one could also be moved to the predicate function ?
Beside that I tried to follow the code path. Are you here trying to prevent the re-evaluation of a PipelineRun for which BuildRuns were already created ? I was looking for something in that direction because a controller restart would reconcile all PipelineRuns and we need to somehow determine what we must trigger and what was already triggered. But not sure if your approach works. Different Builds can have triggers on the same Pipeline but with different status for example. Once the first status is reached, it will trigger BuildRun(s) and set the label. How will we then trigger additional BuildRuns once another status is reached for which other triggers exist ?
We maybe need to somehow remember which status of the PipelineRun we already synced?
I think this one could also be moved to the predicate function ?
Sure! Please consider the latest changes.
Beside that I tried to follow the code path. Are you here trying to prevent the re-evaluation of a PipelineRun for which BuildRuns were already created ?
Yes, that's exactly what the controller does.
I was looking for something in that direction because a controller restart would reconcile all PipelineRuns and we need to somehow determine what we must trigger and what was already triggered. But not sure if your approach works.
Well, I've tested this approach; it does prevent reprocessing PipelineRun instances.
Different Builds can have triggers on the same Pipeline but with different status for example. Once the first status is reached, it will trigger BuildRun(s) and set the label. How will we then trigger additional BuildRuns once another status is reached for which other triggers exist ?
Indeed, that's a use-case the current approach does not cover. I think we might need to restructure the labels to point out which Build names have been triggered.
We maybe need to somehow remember which status of the PipelineRun we already synced?
I think we might need to know which Builds have already been triggered by that PipelineRun.
Please consider the latest changes; we now rely on an annotation to register previously triggered builds and skip repeated events.
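For reference, a hedged sketch of what annotation-based tracking of previously triggered Builds could look like; the annotation key and helper names below are assumptions, not the actual identifiers used in this PR:

// Hypothetical helpers for tracking which Builds a PipelineRun has already triggered.
package filter

import (
	"sort"
	"strings"

	tknv1beta1 "github.com/tektoncd/pipeline/pkg/apis/pipeline/v1beta1"
)

const triggeredBuildsAnnotation = "triggers.shipwright.io/triggered-builds"

// TriggeredBuilds returns the set of Build names already recorded on the PipelineRun.
func TriggeredBuilds(pipelineRun *tknv1beta1.PipelineRun) map[string]bool {
	set := map[string]bool{}
	for _, name := range strings.Split(pipelineRun.GetAnnotations()[triggeredBuildsAnnotation], ",") {
		if name = strings.TrimSpace(name); name != "" {
			set[name] = true
		}
	}
	return set
}

// RecordTriggeredBuilds merges the informed Build names into the annotation.
func RecordTriggeredBuilds(pipelineRun *tknv1beta1.PipelineRun, buildNames []string) {
	set := TriggeredBuilds(pipelineRun)
	for _, name := range buildNames {
		set[name] = true
	}
	names := make([]string, 0, len(set))
	for name := range set {
		names = append(names, name)
	}
	sort.Strings(names)
	annotations := pipelineRun.GetAnnotations()
	if annotations == nil {
		annotations = map[string]string{}
	}
	annotations[triggeredBuildsAnnotation] = strings.Join(names, ",")
	pipelineRun.SetAnnotations(annotations)
}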
if labels == nil {
	labels = map[string]string{}
}
labels[filter.BuildRunsCreatedKey] = strings.Join(created, ", ")
Label values have a fairly small length limitation that you will hit here.
What's the length limitation? Could we store that information in the PipelineRun status itself?
Just 63 characters, https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#syntax-and-character-set. Assuming we do not want to query this data, then we should probably use annotations instead.
Please consider the latest changes; it's using annotations instead of labels. Thanks!
Thanks for the review! Please consider the latest changes 👍
/assign @jkhelil Please consider the Inventory and the controller reflecting the …
@otaviof: GitHub didn't allow me to assign the following users: jkhelil. Note that only shipwright-io members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: otaviof. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Force-pushed from f2f1784 to 4161f8a.
pls check
Annotations: map[string]string{
	filter.OwnedByTektonPipelineRun: pipelineRun.GetName(),
},
I would make this a label to be able to query BuildRuns created for a PipelineRun. Length is no problem here as object names in k8s can also only be 63 characters long.
- Annotations: map[string]string{
- 	filter.OwnedByTektonPipelineRun: pipelineRun.GetName(),
- },
+ Labels: map[string]string{
+ 	filter.OwnedByTektonPipelineRun: pipelineRun.GetName(),
+ },
The BuildRun instances issued are labeled, but the meta-information for filtering repeated events is annotated. The line you highlight is meant for meta-information.
This snippet refers to the annotation added to work in combination with "triggered-builds". Both are used to filter PipelineRun instances that have already been processed.
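To illustrate the label-based querying the review suggests, a hypothetical sketch using a controller-runtime client; the label key below is an assumption:

// Hypothetical query for BuildRuns labeled as issued for a given PipelineRun.
package filter

import (
	"context"

	"github.com/shipwright-io/build/pkg/apis/build/v1alpha1"
	tknv1beta1 "github.com/tektoncd/pipeline/pkg/apis/pipeline/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// ListBuildRunsForPipelineRun returns the BuildRuns carrying the ownership label for the PipelineRun.
func ListBuildRunsForPipelineRun(
	ctx context.Context,
	c client.Client,
	pipelineRun *tknv1beta1.PipelineRun,
) (*v1alpha1.BuildRunList, error) {
	var buildRunList v1alpha1.BuildRunList
	err := c.List(ctx, &buildRunList,
		client.InNamespace(pipelineRun.GetNamespace()),
		client.MatchingLabels{"triggers.shipwright.io/owned-by-pipelinerun": pipelineRun.GetName()},
	)
	return &buildRunList, err
}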
logger.V(0).Info("PipelineRun previously triggered builds", "triggered-builds", triggeredBuilds)

// firing the BuildRun instances for the informed Builds
buildRunsIssued, err := r.issueBuildRunsForPipelineRun(ctx, &pipelineRun, buildNames)
You can get an error here together with a non-empty buildRunsIssued array. The current code will requeue the reconcile, and that imo triggers the same Build again.
We have a number of integration test cases making sure repeated BuildRuns aren't issued by accident, so all the filtering has already taken place before this part is reached. Could you try to test this logic and see if you can bypass the filtering? Please consider the integration tests.
Indeed, however that's an unlikely event, the Kubernetes API-Server erroring out on a single object. Currently we are logging the BuildRun instances issued when this error occurs, so we can assess it later on.
In short, the controller can only react to the current context, and when the API-Server errors out I think the controller should retry.
WDYT?
I think this will become relevant once we have a webhook that is taking over validations from the build controller and potentially rejects an object creation.
Imagine three builds with the same trigger where the first one gets successfully created but the second one fails on creation due to, for example, a missing parameter value.
We should then "commit" what was created successfully before returning the error for a retry.
For the time being, I agree this requires unexpected errors like API server blips or similar.
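A hedged sketch of that "commit before erroring" idea, continuing the hypothetical filter package sketched earlier in this conversation (the helper names, the controller-runtime client, and the issue callback are assumptions):

// Hypothetical sketch: issue BuildRuns one by one, record the successful ones on the
// PipelineRun, and only then surface the failure so a retry does not re-trigger them.
func issueAndRecord(
	ctx context.Context,
	c client.Client,
	pipelineRun *tknv1beta1.PipelineRun,
	buildNames []string,
	issue func(buildName string) error,
) error {
	original := pipelineRun.DeepCopy()
	var created []string
	var firstErr error
	for _, name := range buildNames {
		if err := issue(name); err != nil {
			firstErr = err
			break
		}
		created = append(created, name)
	}
	if len(created) > 0 {
		// RecordTriggeredBuilds is the hypothetical helper sketched earlier.
		RecordTriggeredBuilds(pipelineRun, created)
		if err := c.Patch(ctx, pipelineRun, client.MergeFrom(original)); err != nil && firstErr == nil {
			firstErr = err
		}
	}
	return firstErr
}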
pkg/filter/label.go
Outdated
	pipelineRunBuildRunsIssued = strings.Split(label, ",")
}
pipelineRunBuildRunsIssued = append(pipelineRunBuildRunsIssued, buildRunsIssued...)
labels[BuildRunsCreated] = strings.Join(pipelineRunBuildRunsIssued, ",")
This label value will exceed 63 characters.
Indeed. I amended the logic to only allow 63 characters, please consider the latest changes.
filter.PipelineRunAnnotateName(&pipelineRun)

// patching the PipelineRun to reflect labels and annotations needed on the object
if err = r.Client.Patch(ctx, &pipelineRun, client.Merge); err != nil {
I did not find out how this patch works. Browsing through the code, it reads as if it takes the state of the object and patches it fully, which means that it overwrites changes.
To verify, one would need to do something like this:
obj1 = client.get("id")
obj2 = client.get("id")
obj1.field1 = "newValue1"
client.update(obj1)
obj2.field2 = "newValue2"
client.patch(obj2, client.Merge)
What will be the value of field1 ?
Should we maybe do this:
obj = client.get("id")
originalObj = obj.DeepClone()
obj.field = "newValue"
client.patch(obj, client.MergeFrom(originalObj))
Indeed, let me adopt your suggestion. Please consider the latest changes.
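For completeness, a hedged sketch of the adopted pattern inside the reconciler; the variable names follow the excerpt above, and this is not necessarily the exact final code:

// Hypothetical sketch: snapshot the object before mutating it, then patch only the
// resulting diff, so concurrent changes to other fields are not overwritten.
original := pipelineRun.DeepCopy()

filter.PipelineRunAnnotateName(&pipelineRun)

// client.MergeFrom computes the patch against the snapshot; client.Merge would instead
// send the whole current object state as a merge patch.
if err := r.Client.Patch(ctx, &pipelineRun, client.MergeFrom(original)); err != nil {
	return ctrl.Result{}, err
}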
For(&tknv1beta1.PipelineRun{}).
WithEventFilter(predicate.NewPredicateFuncs(filter.EventFilterPredicate)).
WithEventFilter(predicate.Funcs{
	DeleteFunc: func(e event.DeleteEvent) bool {
Should we filter here also the UpdateFunc ? I think we only are interested in updates where certain fields of the status are updated: startTime, conditions[type=Succeeded].Status.
We already filter the objects; only objects with the interesting fields populated are allowed through. Please check the EventFilterPredicate.
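For illustration, a hypothetical predicate along those lines; the actual EventFilterPredicate may check different fields:

// Hypothetical predicate: only let PipelineRun events through once the status fields the
// reconciler reacts on are populated.
package filter

import (
	tknv1beta1 "github.com/tektoncd/pipeline/pkg/apis/pipeline/v1beta1"
	"knative.dev/pkg/apis"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// EventFilterPredicateSketch returns true when the PipelineRun has started and already
// carries a Succeeded condition.
func EventFilterPredicateSketch(obj client.Object) bool {
	pipelineRun, ok := obj.(*tknv1beta1.PipelineRun)
	if !ok {
		return false
	}
	if pipelineRun.Status.StartTime == nil {
		return false
	}
	for _, cond := range pipelineRun.Status.Conditions {
		if cond.Type == apis.ConditionSucceeded {
			return true
		}
	}
	return false
}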
case pipelineRun.IsCancelled():
	return tknv1beta1.PipelineRunReasonCancelled.String(), nil
The Tekton function is imo weird. It checks the spec value (which I would label IsCancelRequested) but not the status that it really has been canceled successfully. I'd prefer if we trigger based on the latter.
The current approach relies on the Tekton logic directly; I think we should stick with the logic provided by their package.
	return tknv1beta1.PipelineRunReasonFailed.String(), nil
case pipelineRun.IsCancelled():
	return tknv1beta1.PipelineRunReasonCancelled.String(), nil
case pipelineRun.HasTimedOut(ctx, clock.NewFakePassiveClock(now)):
Similar as above, this causes a calculation to happen but does not check whether the status of the PipelineRun really is representing a timeout.
Well, taking into consideration my last comment, I think we need to rely on what Tekton provides. Do you have a suggestion for an alternative path using what their package provides?
I propose to look at the condition with type=Succeeded. For final states such as cancelled or timed out, we should check that the status is "False" and check for the expected reason. They are documented here: https://github.com/tektoncd/pipeline/blob/main/docs/pipelineruns.md#monitoring-execution-status
I would not necessarily rely on their functions as I personally consider them less API-stable than the documented reasons.
I would not necessarily rely on their functions as I personally consider them less API-stable than the documented reasons.
At least in the initial project state, I think we should follow Tekton's upstream and improve that part as we need. Right from the start I'm reluctant to deviate from Tekton upstream, so let's add a new issue and follow up in the next iterations of this project?
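For reference, a hedged sketch of the condition-based check Sascha proposes above, using the documented reasons instead of the Tekton helper functions; the helper name is an assumption:

// Hypothetical sketch: derive the final state from the Succeeded condition and its reason.
package filter

import (
	tknv1beta1 "github.com/tektoncd/pipeline/pkg/apis/pipeline/v1beta1"
	corev1 "k8s.io/api/core/v1"
	"knative.dev/pkg/apis"
)

// FinalReason returns the reason of the Succeeded condition once the PipelineRun reached a
// final state ("True" or "False"), e.g. "Succeeded", "Failed", "Cancelled" or "PipelineRunTimeout".
func FinalReason(pipelineRun *tknv1beta1.PipelineRun) (string, bool) {
	for _, cond := range pipelineRun.Status.Conditions {
		if cond.Type != apis.ConditionSucceeded {
			continue
		}
		if cond.Status == corev1.ConditionTrue || cond.Status == corev1.ConditionFalse {
			return cond.Reason, true
		}
	}
	return "", false
}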
import "testing"

func TestSanitizeURL(t *testing.T) {
Could you check SSH URLs like git@github.com:shipwright-io/build.git?
Sure! I've added support for the git scheme; please consider the latest changes.
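For instance, a hypothetical normalization that makes SSH and HTTPS remotes comparable; the function name and the rewrite rules are assumptions, not the exact implementation merged here:

// Hypothetical sketch of normalizing repository URLs so SSH and HTTPS forms compare equal.
package filter

import "strings"

// SanitizeURLSketch rewrites "git@host:org/repo(.git)" into "https://host/org/repo" and
// strips a trailing ".git" from HTTPS URLs.
func SanitizeURLSketch(rawURL string) string {
	url := strings.TrimSuffix(strings.TrimSpace(rawURL), ".git")
	if strings.HasPrefix(url, "git@") {
		url = "https://" + strings.Replace(strings.TrimPrefix(url, "git@"), ":", "/", 1)
	}
	return url
}

With that, git@github.com:shipwright-io/build.git and https://github.com/shipwright-io/build would normalize to the same value.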
Force-pushed from afa5c24 to d546d49.
Rendering the namespace entry in the chart templates, importing the generated manifest to compose the cluster-role/role and respective binding.
Testing the controllers with envtest.
/lgtm
Changes

Controller Manager (kubebuilder)

The main.go uses the Controller-Manager to set up and run the controllers, as well as to intercept command-line flags and expose health-probe endpoints.

Controllers

Build Inventory

The Build Inventory watches over Build instances with triggers in order to prepare the data to be searched with SearchForGit and SearchForObjectRef. The Inventory is a central component shared between the controllers.

Tekton Pipeline (PipelineRun)

The Tekton Pipeline controller watches over PipelineRun instances and searches the Inventory for Builds to be triggered, given the search criteria match the Inventory contents. Furthermore, the controller needs to skip instances that are part of Custom-Tasks, and instances that have already been processed in case of updates. Those scenarios are covered by integration tests.

Testing

Integration testing and unit testing cover the changes in this pull-request. The integration tests follow the kubebuilder convention, using Ginkgo and envtest. However, during development envtest proved difficult for installing external CRDs other than by providing the YAML definition files directly. The Makefile target download-crds takes care of loading the Shipwright Build and Tekton Pipeline CRDs.

To run the tests use:
Project Infrastructure

Helm Chart

The Helm Chart is adapted to use the ClusterRole manifest generated by controller-gen; the Chart reads and embeds the rules into a templated resource that follows Helm conventions, like for instance a common set of labels. With the modifications, it's now possible to helm install, helm uninstall, and also helm upgrade Shipwright Triggers.

The Chart is also employed to deploy the application in combination with ko. For example, please consider the parameters from my workstation:

Which produces ghcr.io/otaviof/trigger:latest and rolls out the manifests on the "shipwright-build" namespace.

Makefile

Additional targets have been added to the Makefile following kubebuilder standards; these carry a comment to explain the target's goal and details.

Documentation

Documentation covers the major components and some of the main targets for development and testing.

Submitter Checklist

Release Notes